Abstract

Classifiers are often used to detect miscreant activities. We study how an adversary
can efficiently query a classifier to elicit information that allows the adversary to evade
detection at near-minimal cost. We generalize results of Lowd and Meek (2005) to
convex-inducing classifiers. We present algorithms that construct undetected instances
of near-minimal cost using only polynomially many queries in the dimension of the space
and without reverse engineering the decision boundary.



1 INTRODUCTION 

Machine learning is often used to filter or detect miscreant activities in a variety of
applications; e.g., spam, intrusion, virus, and fraud detection. All known detection
techniques have blind spots; i.e., classes of miscreant activity that fail to be detected.
While learning allows the detection algorithm to adapt over time, constraints on the
learning algorithm also may allow an adversary to programmatically find these
vulnerabilities. We consider how an adversary can systematically discover blind spots by
querying the learner to find a low cost instance that the detector does not filter.
Consider a spammer who wishes to minimally modify a spam message so it is not classified
as spam. By observing the responses of the spam detector, the spammer can search for a
modification while using few queries.

The problem of near-optimal evasion (i.e., finding a low cost negative instance with few
queries) was first posed by Lowd and Meek (2005). We continue this line of research by
generalizing it to the family of convex-inducing classifiers: classifiers that partition
their instance space into two sets, one of which is convex. Convex-inducing classifiers
are a natural family to examine as they include linear classifiers, anomaly detection
classifiers using bounded PCA (Lakhina et al. 2004), anomaly detection algorithms that
use hypersphere boundaries (Bishop 2006), and other more complicated bodies.

We also show that near-optimal evasion does not require reverse engineering the
classifier. The algorithm of Lowd and Meek (2005) for evading linear classifiers
reverse-engineers the decision boundary. Our algorithms for evading convex-inducing
classifiers do not require fully estimating the classifier's boundary (which is hard in
the general case; see Rademacher and Goyal 2009) or reverse-engineering the classifier's
state. Instead, we directly search for a minimal-cost evading instance. Our algorithms
require only polynomially many queries, with one algorithm solving the linear case with
fewer queries than the previously published reverse-engineering technique.



Related Work. Dalvi et al. (2004) use a cost-sensitive game-theoretic approach to patch a
classifier's blind spots. They construct a modified classifier designed to detect
optimally modified instances. This work is complementary to our own; we examine optimal
evasion strategies while they have studied mechanisms for adapting the classifier. In
this work we assume the classifier is not adapting during evasion.

A number of authors have studied evading intrusion detection systems (IDSs) (Tan et al.
2002; Wagner and Soto 2002). In exploring mimicry attacks these authors demonstrated that
real IDSs could be fooled by modifying exploits to mimic normal behaviors. These authors
used offline analysis of the IDSs to construct their modifications; by contrast, our
modifications are optimized by querying the classifier.

The field of active learning also studies a form of query-based optimization (Schohn and
Cohn 2000). While both active learning and near-optimal evasion explore optimal querying
strategies, the objectives for these two settings are quite different (see Section 2.3).



2 PROBLEM SETUP 

We begin by introducing our notation and assumptions. First, we assume that instances are
represented in $D$-dimensional Euclidean space $\mathcal{X} = \mathbb{R}^D$. Each
component of an instance $x \in \mathcal{X}$ is a feature which we denote as $x_d$. We
denote each coordinate vector of the form $(0, \ldots, 1, \ldots, 0)$ with a 1 only at
the $d$-th feature as $\delta_d$. We assume that the feature space is known to the
adversary and any point in $\mathcal{X}$ can be queried.

We further assume the target classifier $f$ belongs to a family $\mathcal{F}$. Any
classifier $f \in \mathcal{F}$ is a mapping from $\mathcal{X}$ to the labels '-' and '+';
i.e., $f : \mathcal{X} \to \{$'-', '+'$\}$. We assume the adversary's attack will be
against a fixed $f$ so the learning method and the training data used to select $f$ are
irrelevant. We assume the adversary does not know $f$ but does know its family
$\mathcal{F}$.

We assume $f \in \mathcal{F}$ is deterministic and so partitions $\mathcal{X}$ into a
positive class $\mathcal{X}_f^+ = \{x \in \mathcal{X} \mid f(x) = $ '+'$\}$ and a negative
class $\mathcal{X}_f^- = \{x \in \mathcal{X} \mid f(x) = $ '-'$\}$. We take the negative
set to be normal instances. We assume the adversary is aware of at least one instance in
each class, $x^- \in \mathcal{X}_f^-$ and $x^A \in \mathcal{X}_f^+$, and can observe
$f(x)$ for any $x$ by issuing a membership query (this last assumption does not always
hold in practice; see Section 4 for a more detailed discussion).

2.1 Adversarial Cost 

We assume the adversary has a notion of utility represented by a cost function
$A : \mathcal{X} \to \mathbb{R}^{0+}$. The adversary wishes to minimize $A$ over the
negative class, $\mathcal{X}_f^-$; e.g., a spammer wants to send spam that will be
classified as normal email ('-') rather than as spam ('+'). We assume this cost function
is a distance to a positive target instance $x^A \in \mathcal{X}_f^+$ that is most
desirable to the adversary. As with Lowd and Meek, we focus on the class of weighted
$\ell_1$ cost functions

$$A(x) = \sum_{d=1}^{D} c_d \, |x_d - x_d^A| \qquad (1)$$

where $0 < c_d < \infty$ is the cost the adversary associates with the $d$-th feature.
The $\ell_1$-norm is a natural measure of edit distance for email spam, while larger
weights can model tokens that are more costly to remove (e.g., a payload URL). We use
$\mathbb{B}^C(x^A)$ to denote the ball centered at $x^A$ with cost no more than $C$. We
use $\mathbb{B}_1^C(x)$ to refer specifically to a weighted $\ell_1$ ball.
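As a small illustration of the weighted $\ell_1$ cost described above (each feature's absolute deviation from $x^A$, scaled by its per-feature cost $c_d$), the following sketch may help. The function names and the toy token vectors are hypothetical and chosen only for this example; they are not part of the original work.

```python
from typing import Sequence


def weighted_l1_cost(x: Sequence[float], x_a: Sequence[float],
                     c: Sequence[float]) -> float:
    """Weighted l1 cost: sum over features of c_d * |x_d - x_a_d|."""
    if not (len(x) == len(x_a) == len(c)):
        raise ValueError("x, x_a and c must have the same length")
    if any(cd <= 0 for cd in c):
        raise ValueError("every feature cost c_d must be strictly positive")
    return sum(cd * abs(xd - ad) for xd, ad, cd in zip(x, x_a, c))


def in_cost_ball(x, x_a, c, radius: float) -> bool:
    """True if x lies in the weighted l1 ball of cost `radius` around x_a."""
    return weighted_l1_cost(x, x_a, c) <= radius


# Hypothetical toy data: three binary token features; the third token
# (think of a payload URL) is five times as costly to change.
x_a = [1.0, 1.0, 1.0]
c = [1.0, 1.0, 5.0]
cheap_edit = [0.0, 1.0, 1.0]    # drop the first token
costly_edit = [1.0, 1.0, 0.0]   # drop the expensive token
print(weighted_l1_cost(cheap_edit, x_a, c), weighted_l1_cost(costly_edit, x_a, c))
```

Under these assumed weights, removing the cheap token costs 1 while removing the payload token costs 5, so an adversary minimizing this cost would prefer the first edit if both evade detection.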



Lowd and Meek (2005) define the minimal adversarial cost (MAC) of a classifier $f$ to be
the value

$$\mathrm{MAC}(f, A) = \inf_{x \in \mathcal{X}_f^-} [A(x)].$$

They further define a data point to be an ε-approximate instance of minimal adversarial
cost (ε-IMAC) if it is a negative instance with cost no more than a factor
$(1 + \epsilon)$ of the MAC; i.e., every ε-IMAC is a member of the set¹

$$\epsilon\text{-IMAC}(f, A) = \{x \in \mathcal{X}_f^- \mid A(x) \le (1 + \epsilon) \cdot \mathrm{MAC}(f, A)\} \qquad (2)$$

The adversary's goal is to find an ε-IMAC instance efficiently, while issuing as few
queries as possible.

2.2 Search Terminology 

An ε-IMAC instance is multiplicatively optimal; i.e., it is within a factor of
$(1 + \epsilon)$ of the minimal cost. We also consider additive optimality; i.e.,
requiring an η-IMAC to be no more than $\eta$ greater than the minimal cost. The
algorithms we present can achieve either criterion given initial bounds $C^+$ and $C^-$
such that $C^+ \le \mathrm{MAC} \le C^-$. If we can determine whether an intermediate
cost establishes a new upper or lower bound on MAC, then binary search strategies can
iteratively reduce the $t$-th gap between $C_t^-$ and $C_t^+$. We now provide common
terminology for the binary search and in Section 3 we use convexity to establish a new
bound at each iteration.

In the $t$-th iteration of an additive binary search, $G_t^{(+)} = C_t^- - C_t^+$ is the
additive gap between the $t$-th bounds. The search uses a proposal step of
$C_t = \frac{C_t^- + C_t^+}{2}$, a stopping criterion of $G_t^{(+)} \le \eta$ and
terminates in

$$L^{(+)} = \left\lceil \log_2 \left[ (C^- - C^+)/\eta \right] \right\rceil \qquad (3)$$

steps. Binary search has the best worst-case query complexity for achieving η-additive
optimality.

Binary search can be adapted for multiplicative optimality: by writing $C^- = 2^a$ and
$C^+ = 2^b$, the multiplicative condition becomes $a - b \le \log_2(1 + \epsilon)$, an
additive optimality condition. Thus, binary search on the exponent best achieves
multiplicative optimality. The multiplicative gap of the $t$-th iteration is
$G_t^{(*)} = C_t^-/C_t^+$. The $t$-th query is $C_t = \sqrt{C_t^- \cdot C_t^+}$, the
stopping criterion is $G_t^{(*)} \le 1 + \epsilon$ and it stops in

$$L^{(*)} = \left\lceil \log_2 \left[ \log_2(C^-/C^+) / \log_2(1 + \epsilon) \right] \right\rceil \qquad (4)$$

steps. Multiplicative optimality only makes sense when both $C^-$ and $C^+$ are strictly
positive.



1 We use the term ε-IMAC to refer both to this set and members of it. The usage will be
clear from the context.

For this paper, we only address multiplicative optimality and define $L = L^{(*)}$ and
$G_t = G_t^{(*)}$, but note that our techniques also apply to additive optimality.
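The multiplicative binary search described in this section can be sketched as follows. This is only an illustration under stated assumptions: `is_lower_bound` is a hypothetical oracle that reports whether a proposed cost is a lower bound on the MAC (Section 3 describes how such a test can be realized with queries), and the threshold `true_mac` exists only to drive the toy oracle.

```python
import math


def multiplicative_steps(c_minus: float, c_plus: float, eps: float) -> int:
    """Step count for initial bounds 0 < c_plus <= MAC <= c_minus."""
    if c_minus / c_plus <= 1 + eps:
        return 0
    ratio = math.log2(c_minus / c_plus) / math.log2(1 + eps)
    return math.ceil(math.log2(ratio))


def multiplicative_search(is_lower_bound, c_plus, c_minus, eps):
    """Binary search on the exponent until c_minus / c_plus <= 1 + eps."""
    steps = 0
    while c_minus / c_plus > 1 + eps:
        # The geometric mean is the midpoint of the two exponents.
        c = math.sqrt(c_minus * c_plus)
        if is_lower_bound(c):
            c_plus = c
        else:
            c_minus = c
        steps += 1
    return c_plus, c_minus, steps


# Hypothetical toy oracle: the true MAC is hidden from the search itself.
true_mac = 3.0
lo, hi, steps = multiplicative_search(lambda c: c < true_mac, 1.0, 16.0, 0.1)
print(lo, hi, steps, multiplicative_steps(16.0, 1.0, 0.1))
```

Each step takes the square root of the multiplicative gap regardless of the oracle's answer, which is why the number of iterations depends only on the initial bounds and on ε.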

2.3 Near-Optimal Evasion 



Lowd and Meek (2005) introduced the problem of adversarial classifier reverse engineering
(ACRE) where a family of classifiers is called ACRE ε-learnable if there is an efficient
query-based algorithm for finding an ε-IMAC. In generalizing their result, we slightly
alter their definition of query complexity. First, to quantify query complexity we only
use the dimension $D$ and the number of steps $L$ required by a univariate binary search.
Second, we assume the adversary only has two initial points $x^- \in \mathcal{X}_f^-$ and
$x^A \in \mathcal{X}_f^+$ (the original setting required a third
$x^+ \in \mathcal{X}_f^+$). Finally, our algorithms do not reverse engineer so ACRE would
be a misnomer. Instead we call the overall problem Near-Optimal Evasion and replace ACRE
ε-learnable with:

A family of classifiers $\mathcal{F}$ is ε-IMAC searchable under a family of cost
functions $\mathcal{A}$ if for all $f \in \mathcal{F}$ and $A \in \mathcal{A}$, there is
an algorithm that finds $x \in \epsilon\text{-IMAC}(f, A)$ using polynomially many
membership queries in $D$ and $L$.

Reverse engineering is an expensive approach for near-optimal evasion in the general
case. Efficient query-based reverse engineering for $f \in \mathcal{F}$ is sufficient for
minimizing $A$ over the estimated negative space. However, the requirements for finding
an ε-IMAC differ from the objectives of reverse engineering approaches such as active
learning. Both use queries to reduce the size of version space
$\hat{\mathcal{F}} \subset \mathcal{F}$. However reverse engineering minimizes the
expected number of disagreements between members of $\hat{\mathcal{F}}$. In contrast, to
find an ε-IMAC, we only need to provide a single instance
$x \in \epsilon\text{-IMAC}(f, A)$ for all $f \in \hat{\mathcal{F}}$, while leaving the
classifier largely unspecified. We present algorithms for ε-IMAC search on a family of
classifiers that generally cannot be efficiently reverse engineered; the queries we
construct necessarily elicit an ε-IMAC only.

3 EVASION OF CONVEX CLASSES

We generalize ε-IMAC searchability to the family of convex-inducing classifiers
$\mathcal{F}^{\mathrm{convex}}$ that partition feature space $\mathcal{X}$ into a
positive and negative class, one of which is convex. The convex-inducing classifiers
include linear classifiers, one-class classifiers that predict anomalies by thresholding
the log-likelihood of a log-concave (or uni-modal) density function, and quadratic
classifiers of the form $x^\top A x + b^\top x + c \ge 0$ if $A$ is semidefinite. The
convex-inducing classifiers also include complicated families such as the set of all
intersections of a countable number of halfspaces, cones, or balls.

We construct efficient algorithms for query-based optimization of the (weighted) $\ell_1$
cost of Eq. (1) for convex-inducing classifiers. There appears to be an asymmetry
depending on whether the positive or negative class is convex. When the positive set is
convex, determining whether $\mathbb{B}_1^C(x^A) \subset \mathcal{X}_f^+$ only requires
querying the vertices of the ball. When the negative set is convex, determining whether
$\mathbb{B}_1^C(x^A) \cap \mathcal{X}_f^- = \emptyset$ is difficult since the
intersection need not occur at a vertex. We present an efficient algorithm for optimizing
an $\ell_1$ cost when $\mathcal{X}_f^+$ is convex and a polynomial random algorithm for
optimizing any convex cost when $\mathcal{X}_f^-$ is convex.

The algorithms we present achieve multiplicative optimality via binary search; we use $L$
as the number of phases required by binary search, $C^- = A(x^-)$ as an initial upper
bound on the MAC and assume there is some $C^+ > 0$ that lower bounds the MAC (i.e.,
$x^A$ is in the interior of $\mathcal{X}_f^+$). This condition eliminates the degenerate
case for which $x^A$ is on the boundary of $\mathcal{X}_f^+$ where
$\mathrm{MAC}(f, A) = 0$ and $\epsilon\text{-IMAC}(f, A) = \emptyset$.



3.1 ε-IMAC Search for a Convex $\mathcal{X}_f^+$

Solving the ε-IMAC search problem when $\mathcal{X}_f^+$ is convex is hard in the general
case of convex cost $A(\cdot)$. We demonstrate algorithms for the (weighted) $\ell_1$
cost that solve the problem as a binary search. Namely, given initial costs $C^+$ and
$C^-$ that bound the MAC, our algorithm can efficiently determine whether
$\mathbb{B}_1^C(x^A) \subset \mathcal{X}_f^+$ for any intermediate cost
$C^+ < C < C^-$. If the $\ell_1$ ball is contained in $\mathcal{X}_f^+$, then $C$ becomes
the new lower bound $C^+$. Otherwise $C$ becomes the new upper bound $C^-$. Since our
objective Eq. (2) is to obtain multiplicative optimality, our steps will be
$C_t = \sqrt{C_{t-1}^+ \cdot C_{t-1}^-}$ (see Section 2.2). We now explain how we exploit
the properties of the (weighted) $\ell_1$ ball and convexity of $\mathcal{X}_f^+$ to
efficiently determine whether $\mathbb{B}_1^C(x^A) \subset \mathcal{X}_f^+$.

The existence of an efficient query algorithm relies on three facts: (1)
$x^A \in \mathcal{X}_f^+$; (2) every weighted $\ell_1$ cost $C$-ball centered at $x^A$
intersects $\mathcal{X}_f^-$ only if at least one of its vertices is in
$\mathcal{X}_f^-$; and (3) $C$-balls only have $2 \cdot D$ vertices. We formalize the
second fact as follows.

Lemma 3.1. For all $C > 0$, if there exists some $x \in \mathcal{X}_f^-$ that achieves a
cost of $C = A(x)$, then there is some feature $d$ such that a vertex of the form

$$x^A \pm \frac{C}{c_d} \delta_d \qquad (5)$$

is in $\mathcal{X}_f^-$ (and also achieves cost $C$ by Eq. (1)).

Proof. Suppose not; then there is some $x \in \mathcal{X}_f^-$ such that $A(x) = C$ and
$x$ has $M \ge 2$ features that differ from $x^A$. Let $\{d_1, \ldots, d_M\}$ be the
differing features and let $b_{d_i} = \mathrm{sign}(x_{d_i} - x^A_{d_i})$ be the sign of
the difference between $x$ and $x^A$ along the $d_i$-th feature. Let
$e^{d_i} = x^A + \frac{C}{c_{d_i}} \cdot b_{d_i} \cdot \delta_{d_i}$ be a vertex of the
form of Eq. (5) which has cost $C$ (from Eq. (1)). The $M$ vertices $e^{d_i}$ form a
simplex of cost $C$ on which $x$ lies. If all $e^{d_i} \in \mathcal{X}_f^+$, then the
convexity of $\mathcal{X}_f^+$ implies that $x \in \mathcal{X}_f^+$ which violates our
premise. Thus, if any instance in $\mathcal{X}_f^-$ achieves cost $C$, there is always a
vertex of the form Eq. (5) in $\mathcal{X}_f^-$ that also achieves cost $C$. □

As a consequence, if all vertices of any $C$ ball $\mathbb{B}_1^C(x^A)$ are positive,
then all $x$ with $A(x) \le C$ are positive thus establishing $C$ as a lower bound on the
MAC. Conversely, if any of the vertices of $\mathbb{B}_1^C(x^A)$ are negative, then $C$
is an upper bound. Thus, by querying all $2 \cdot D$ vertices of $\mathbb{B}_1^C(x^A)$,
we either establish $C$ as a new lower or upper bound on the MAC. By performing a binary
search on $C$ we iteratively take the square root of the multiplicative gap between our
bounds (halving it on a logarithmic scale) until it is within a factor of
$1 + \epsilon$. This yields an ε-IMAC of the form of Eq. (5).
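The vertex test used above can be sketched in a few lines. This is an illustration only: the helper names are hypothetical, and the toy classifier (a halfspace, whose positive side is convex) is an assumption made so that the example is self-contained.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Point = List[float]


def l1_ball_vertices(x_a: Sequence[float], c: Sequence[float],
                     cost: float) -> List[Point]:
    """The 2*D vertices x_a +/- (cost / c_d) * delta_d of the weighted l1 ball."""
    vertices = []
    for d, cd in enumerate(c):
        for sign in (1.0, -1.0):
            v = list(x_a)
            v[d] += sign * cost / cd
            vertices.append(v)
    return vertices


def bound_from_vertices(f: Callable[[Point], str], x_a, c,
                        cost: float) -> Tuple[str, Optional[Point]]:
    """Return ('upper', v) if some vertex v is negative, else ('lower', None)."""
    for v in l1_ball_vertices(x_a, c, cost):
        if f(v) == '-':
            return 'upper', v   # a negative vertex: MAC <= cost
    return 'lower', None        # all vertices positive: MAC >= cost


# Hypothetical classifier with a convex positive class (an open halfspace).
def f(x: Point) -> str:
    return '+' if x[0] + x[1] < 2.0 else '-'


x_a, c = [0.0, 0.0], [1.0, 2.0]
print(bound_from_vertices(f, x_a, c, 1.5))   # every vertex is still positive
print(bound_from_vertices(f, x_a, c, 2.5))   # the vertex (2.5, 0) is negative
```

For this toy classifier the cheapest evasion moves only along the first feature, so the true MAC is 2; the test correctly reports 1.5 as a lower bound and 2.5 as an upper bound using at most $2 \cdot D = 4$ queries each.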

A general form of this multiline search procedure is presented as Algorithm 3.2 which
simultaneously searches along all unit-cost directions in the set $W$. At each step,
MultiLineSearch issues at most $|W|$ queries to determine whether
$\mathbb{B}^C(x^A) \subset \mathcal{X}_f^+$. Once a negative instance is found at cost
$C$, we cease further queries at cost $C$ since a single negative instance is sufficient
to establish an upper bound. We call this policy lazy querying. Further, when an upper
bound is established for a cost $C$, our algorithm also prunes all directions that were
positive at cost $C$. This pruning is sound; by the convexity assumption we know that the
pruned direction is positive for all costs less than our new upper bound $C$. Applying
MultiLineSearch to the $2 \cdot D$ axis-aligned directions yields an ε-IMAC for any
(weighted) $\ell_1$ cost with no more than $2 \cdot DL$ queries but at least $D + L$
queries. Thus the algorithm is $O(DL)$.
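A minimal Python sketch of the multi-line search with lazy querying and pruning follows. It is an illustration under assumptions, not the authors' implementation: the function and variable names are hypothetical, directions are represented as unit-cost tuples, and the toy classifier is a halfspace chosen so that the true MAC (equal to 2) is known.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = List[float]


def axis_directions(c: Sequence[float]) -> List[Tuple[float, ...]]:
    """The 2*D unit-cost directions +/- delta_d / c_d for a weighted l1 cost."""
    dirs = []
    for d, cd in enumerate(c):
        for sign in (1.0, -1.0):
            dirs.append(tuple(sign / cd if k == d else 0.0 for k in range(len(c))))
    return dirs


def multi_line_search(f: Callable[[Point], str], x_a, directions, x_minus,
                      c_plus: float, c_minus: float, eps: float):
    W = list(directions)
    x_star, queries = list(x_minus), 0
    while c_minus / c_plus > 1 + eps:
        c = math.sqrt(c_plus * c_minus)
        positive, found_negative = [], False
        for e in W:
            point = [a + c * ed for a, ed in zip(x_a, e)]
            queries += 1
            if f(point) == '-':
                x_star, found_negative = point, True
                break                      # lazy querying: one negative suffices
            positive.append(e)
        if found_negative:
            # Directions positive at cost c stay positive below the new upper bound.
            W = [e for e in W if e not in positive]
            c_minus = c
        else:
            c_plus = c
    return x_star, queries


def f(x: Point) -> str:                    # hypothetical convex positive class
    return '+' if x[0] + x[1] < 2.0 else '-'


x_a, c, eps = [0.0, 0.0], [1.0, 2.0], 0.05
x_minus = [3.0, 0.0]                       # known negative instance, cost 3
x_star, queries = multi_line_search(f, x_a, axis_directions(c), x_minus,
                                    0.5, 3.0, eps)
cost = sum(cd * abs(xd - ad) for xd, ad, cd in zip(x_star, x_a, c))
print(x_star, cost, queries)
```

The returned instance always has cost equal to the final upper bound, so once the loop exits its cost is within a factor $1 + \epsilon$ of the lower bound and hence of the MAC.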

Algorithm 3.2. Multi-line Search

MLS($W$, $x^A$, $x^-$, $C^+$, $C^-$, $\epsilon$)
  $x^* \leftarrow x^-$
  while $C^-/C^+ > 1 + \epsilon$ do begin
    $C \leftarrow \sqrt{C^+ \cdot C^-}$
    for all $e \in W$ do begin
      Query classifier: $f_e \leftarrow f(x^A + C e)$
      if $f_e$ = '-' then begin
        $x^* \leftarrow x^A + C e$
        Prune $i$ from $W$ if $f_i$ = '+'
        break for-loop
      end if
    end for
    if $\forall e \in W$ $f_e$ = '+' then $C^+ \leftarrow C$
    else $C^- \leftarrow C$
  end while
  return: $x^*$

3.1.1 K-step Multi-Line Search

The MultiLineSearch algorithm is $2 \cdot D$ simultaneous binary searches
(breadth-first). Instead we could search sequentially (depth-first) and obtain a best
case of $O(D + L)$ and worst case of $O(D \cdot L)$ but for exactly the opposite convex
bodies. We therefore propose an algorithm that mixes these strategies. At each phase, the
K-step MultiLineSearch (Algorithm 3.3) chooses a single direction $e$ and queries it for
$K$ steps to generate candidate bounds $B^-$ and $B^+$ on the MAC. The algorithm makes
substantial progress without querying other directions. It then iteratively queries all
remaining directions at the candidate lower bound $B^+$. Again we use lazy querying and
stop as soon as a negative instance is found. We show that for
$K = \lceil \sqrt{L} \rceil$, the algorithm achieves a delicate balance between
breadth-first and depth-first approaches to attain a better worst-case complexity.
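The K-step strategy described above can be sketched as follows. This is a hedged illustration rather than the authors' code: all names are hypothetical, the direction $e$ is simply taken to be the first remaining direction, and (as an assumption of this sketch) $e$ itself is pruned together with the other positive directions whenever it answered '+' at the candidate lower bound.

```python
import math
from typing import Callable, List

Point = List[float]


def k_step_mls(f: Callable[[Point], str], x_a, directions, x_minus,
               c_plus: float, c_minus: float, eps: float, k: int):
    W, x_star, queries = list(directions), list(x_minus), 0

    def query(e, cost):
        nonlocal queries
        queries += 1
        point = [a + cost * ed for a, ed in zip(x_a, e)]
        return f(point), point

    while c_minus / c_plus > 1 + eps:
        e = W[0]                                  # depth-first along one direction
        b_plus, b_minus = c_plus, c_minus
        for _ in range(k):
            b = math.sqrt(b_plus * b_minus)
            label, point = query(e, b)
            if label == '+':
                b_plus = b
            else:
                b_minus, x_star = b, point
        # e is known to be positive at b_plus only if b_plus moved during the K steps.
        positive = [e] if b_plus > c_plus else []
        found_negative = False
        for i in W:                               # breadth-first check at b_plus
            if i == e:
                continue
            label, point = query(i, b_plus)
            if label == '-':
                x_star, found_negative = point, True
                break                             # lazy querying
            positive.append(i)
        c_minus = b_minus
        if found_negative:
            W = [i for i in W if i not in positive]
            c_minus = b_plus
        else:
            c_plus = b_plus
    return x_star, queries


def f(x: Point) -> str:                           # hypothetical convex positive class
    return '+' if x[0] + x[1] < 2.0 else '-'


x_a, c, eps = [0.0, 0.0], [1.0, 2.0], 0.05
dirs = [(1.0, 0.0), (-1.0, 0.0), (0.0, 0.5), (0.0, -0.5)]   # unit-cost directions
x_star, queries = k_step_mls(f, x_a, dirs, [3.0, 0.0], 0.5, 3.0, eps, k=3)
cost = sum(cd * abs(xd - ad) for xd, ad, cd in zip(x_star, x_a, c))
print(x_star, cost, queries)
```

Here $L = 6$ for the chosen bounds and ε, so $k = 3 = \lceil \sqrt{6} \rceil$. In this toy run most of the binary search steps are spent along a single direction, and the other directions are only consulted to confirm each candidate lower bound.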

To analyze the worst case of K-step MultiLine- 
Search, we consider a defender that maximizes the 
number of queries. We refer to the querier as the ad- 
versary. 

Algorithm 3.3. K-Step Multi-line Search

KMLS($W$, $x^A$, $x^-$, $C^+$, $C^-$, $\epsilon$, $K$)
  $x^* \leftarrow x^-$
  while $C^-/C^+ > 1 + \epsilon$ do begin
    Choose a direction $e \in W$
    $B^+ \leftarrow C^+$
    $B^- \leftarrow C^-$
    for $K$ steps do begin
      $B \leftarrow \sqrt{B^+ \cdot B^-}$
      Query classifier: $f_e \leftarrow f(x^A + B e)$
      if $f_e$ = '+' then $B^+ \leftarrow B$
      else $B^- \leftarrow B$ and $x^* \leftarrow x^A + B e$
    end for
    for all $i \neq e \in W$ do begin
      Query classifier: $f_i \leftarrow f(x^A + (B^+) i)$
      if $f_i$ = '-' then begin
        $x^* \leftarrow x^A + (B^+) i$
        Prune $k$ from $W$ if $f_k$ = '+'
        break for-loop
      end if
    end for
    $C^- \leftarrow B^-$
    if $\forall i \in W$ $f_i$ = '+' then $C^+ \leftarrow B^+$
    else $C^- \leftarrow B^+$
  end while
  return: $x^*$

Theorem 3.4. Algorithm 3.3 will find an ε-IMAC with at most
$O(L + \sqrt{L} |W|)$ queries for $K = \lceil \sqrt{L} \rceil$.

Proof. During the $K$ steps of binary search, regardless of how the defender responds,
the candidate gap along $e$ will shrink by an exponent of $2^{-K}$; i.e.,

$$B^-/B^+ = (C^-/C^+)^{2^{-K}} \qquad (6)$$

The primary decision for the defender occurs when the adversary begins querying other
directions than $e$. At iteration $t$, it has 2 options:

Case 1 ($t \in \mathcal{C}_1$): Respond with '+' for all remaining directions. Here the
bounds $B^+$ and $B^-$ are verified and thus the gap is reduced by an exponent of
$2^{-K}$.

Case 2 ($t \in \mathcal{C}_2$): Choose at least 1 direction to respond with '-'. Here the
defender can make the gap decrease negligible but also must choose some number
$E_t \ge 1$ of eliminated directions.

By conservatively assuming the gap only decreases in case 1, the total number of queries
is bounded regardless of the order in which the cases are applied. Thus if
$t \in \mathcal{C}_1$ we have $G_t = G_{t-1}^{2^{-K}}$; otherwise we have
$G_t = G_{t-1}$. Thus

$$|\mathcal{C}_1| \le \left\lceil \frac{L}{K} \right\rceil \qquad (7)$$

since we need a total of $L$ binary search steps and each case 1 iteration does $K$ of
them.

Every case 1 iteration makes exactly $K + |W_t| - 1$ queries. The size of $W_t$ is
controlled by the defender, but we can bound it by $|W|$. This and Eq. (7) bound the
number of queries used in case 1 ($Q_1$) by

$$Q_1 = \sum_{t \in \mathcal{C}_1} (K + |W_t| - 1) \le L + K + \left\lceil \frac{L}{K} \right\rceil \cdot (|W| - 1).$$

Each case 2 iteration uses exactly $K + E_t$ queries and eliminates $E_t \ge 1$
directions. Since a case 2 iteration eliminates at least 1 direction,
$|\mathcal{C}_2| \le |W| - 1$ and moreover,
$\sum_{t \in \mathcal{C}_2} E_t \le |W| - 1$ since each direction can only be eliminated
once. Thus

$$Q_2 = \sum_{t \in \mathcal{C}_2} (K + E_t) \le (|W| - 1)(K + 1),$$

and so the total queries used by Algorithm 3.3 is

$$Q = Q_1 + Q_2 < L + \left( \left\lceil \frac{L}{K} \right\rceil + K + 1 \right) |W|,$$

which is minimized by $K = \lceil \sqrt{L} \rceil$. Substituting this for $K$ and using
$L / \lceil \sqrt{L} \rceil \le \sqrt{L}$ we have

$$Q < L + (2 \lceil \sqrt{L} \rceil + 1) |W|.$$ □

As a consequence of Theorem 3.4, finding an ε-IMAC with Algorithm 3.3 for a (weighted)
$\ell_1$ cost requires $O(L + \sqrt{L} D)$ queries. Moreover, linear classifiers are a
special case of convex-inducing classifiers for our K-step MultiLineSearch algorithm.
Thus K-step MultiLineSearch improves on the reverse-engineering technique's $O(LD)$
queries and applies to a broader family.

3.1.2 Lower Bound

Here we find lower bounds on the number of queries required by any algorithm to find an
ε-IMAC when $\mathcal{X}_f^+$ is convex. Notably, since an ε-IMAC uses multiplicative
optimality, we incorporate a lower bound $r > 0$ on the MAC into our statement.

Theorem 3.5. Consider any $D > 0$, $x^A \in \mathbb{R}^D$, $x^- \in \mathbb{R}^D$,
$0 < r < R = A(x^-)$ and $\epsilon \in (0, \frac{R}{r} - 1)$. For all query algorithms
submitting $N < \max\{D, L^{(*)}\}$ queries, there exist two classifiers inducing convex
positive classes in $\mathbb{R}^D$ such that

1. Both positive classes properly contain $\mathbb{B}^r(x^A)$;

2. Neither positive class contains $x^-$;

3. The classifiers return the same responses on the algorithm's $N$ queries; and

4. The classifiers have no common ε-IMAC.

That is, in the worst case all query algorithms for convex positive classes must submit
at least $\max\{D, L^{(*)}\}$ membership queries in order to be multiplicative
ε-optimal.

Proof. Suppose some query-based algorithm submits $N$ membership queries
$x^1, \ldots, x^N$ to the classifier. For the algorithm to be ε-optimal, these queries
must constrain all consistent positive convex sets to have a common point among their
ε-IMAC sets.

First we consider the case that $N \ge L$. Then by assumption $N < D$. Suppose classifier
$f$ responds as

$$f(x) = \begin{cases} +1, & \text{if } A(x) < R \\ -1, & \text{otherwise} \end{cases}$$

For this classifier, $\mathcal{X}_f^+$ is convex,
$\mathbb{B}^r(x^A) \subset \mathcal{X}_f^+$, and $x^- \notin \mathcal{X}_f^+$. Moreover,
since $\mathcal{X}_f^+$ is the open ball of cost $R$, $\mathrm{MAC}(f, A) = R$.

Consider an alternative classifier $g$ that responds identically to $f$ for
$x^1, \ldots, x^N$ but has a different convex positive set $\mathcal{X}_g^+$. Without
loss of generality, suppose the first $M \le N$ queries are positive and the remaining
are negative. Let $\mathcal{Q} = \mathrm{conv}(x^1, \ldots, x^M)$; that is, the convex
hull of the $M$ positive queries. Now let $\mathcal{X}_g^+$ be the convex hull of the
union of $\mathcal{Q}$ and the $r$-ball around $x^A$:
$\mathcal{X}_g^+ = \mathrm{conv}(\mathcal{Q} \cup \mathbb{B}^r(x^A))$. Since
$\mathcal{Q}$ contains all positive queries and $r < R$, the convex set
$\mathcal{X}_g^+$ is consistent with the responses from $f$,
$\mathbb{B}^r(x^A) \subset \mathcal{X}_g^+$, and $x^- \notin \mathcal{X}_g^+$. Further,
since $M \le N < D$, $\mathcal{Q}$ is contained in a proper subspace of $\mathbb{R}^D$
whereas $\mathbb{B}^r(x^A)$ is not. Hence, $\mathrm{MAC}(g, A) = r$. Since the accuracy
$\epsilon$ is less than $\frac{R}{r} - 1$, any ε-IMAC of $g$ must have cost less than $R$
whereas any ε-IMAC of $f$ must have cost greater than or equal to $R$. Thus we have
constructed two convex-inducing classifiers $f$ and $g$ with consistent query responses
but with no common ε-IMAC.

Second, we consider the case that $N < L$. First, recall our definitions: $C_0^- = R$ is
the initial upper bound on the MAC, $C_0^+ = r$ is the initial lower bound on the MAC,
and $G_t^{(*)} = C_t^-/C_t^+$ is the gap between the upper bound and lower bound at
iteration $t$. Here the defender $f$ responds with

$$f(x^t) = \begin{cases} +1, & \text{if } A(x^t) \le \sqrt{C_{t-1}^- \cdot C_{t-1}^+} \\ -1, & \text{otherwise} \end{cases}$$

This strategy ensures that at each iteration $G_t \ge \sqrt{G_{t-1}}$ and since the
algorithm cannot terminate until $G_N \le 1 + \epsilon$, we have $N \ge L^{(*)}$ from
Eq. (4). As in the $N \ge L$ case we have constructed two convex-inducing classifiers
with consistent query responses but with no common ε-IMAC. The first classifier's
positive set is the smallest cost-ball enclosing all positive queries, while the second
classifier's positive set is the largest cost-ball enclosing all positive queries but no
negatives. The MAC values of these sets differ by more than a factor of $(1 + \epsilon)$
if $N < L$ so they have no common ε-IMAC. □

This theorem shows that ε-multiplicative optimality requires $\Omega(D + L)$ queries.
Hence K-step MultiLineSearch (Algorithm 3.3) has close to the optimal query complexity.

3.2 ε-IMAC Learning for a Convex $\mathcal{X}_f^-$

In this section we consider minimizing a convex cost function $A$ (we focus on weighted
$\ell_1$ costs in Eq. (1)) when the feasible set $\mathcal{X}_f^-$ is convex. Any convex
function can be efficiently minimized within a known convex set, e.g., using the
Ellipsoid or Interior Point methods (Boyd and Vandenberghe 2004). However, in our problem
the convex set is only accessible through queries. We use a randomized polynomial
algorithm of Bertsimas and Vempala (2004) to minimize the cost given an initial
$x^- \in \mathcal{X}_f^-$. For any fixed cost $C^t$ we use their algorithm to determine
(with high probability) whether $\mathcal{X}_f^-$ intersects with
$\mathbb{B}^{C^t}(x^A)$; i.e., whether or not $C^t$ is a new lower or upper bound on the
MAC. With high probability, we find an ε-IMAC in no more than $L$ repetitions using
binary search.

3.2.1 Intersection of Convex Sets

We now outline Bertsimas and Vempala's query-based algorithm for determining whether two
convex sets $\mathcal{P}$ and $\mathcal{B}$ intersect using a randomized Ellipsoid
method. In particular $\mathcal{P}$ is only accessible through membership queries and
$\mathcal{B}$ provides a separating hyperplane for any point outside it. They use
efficient query-based approaches to uniformly sample from $\mathcal{P}$ to produce
sufficiently many samples such that cutting $\mathcal{P}$ through the centroid of these
samples with a separating hyperplane from $\mathcal{B}$ significantly reduces the volume
of $\mathcal{P}$ with high probability. Their algorithm thus constructs a sequence of
progressively smaller feasible sets $\mathcal{P}^s \subset \mathcal{P}^{s-1}$ until
either the algorithm finds a point in $\mathcal{P} \cap \mathcal{B}$ or it is highly
unlikely that the sets intersect.

Our problem reduces to finding the intersection between $\mathcal{X}_f^-$ and
$\mathbb{B}_1^{C^t}(x^A)$. Though $\mathcal{X}_f^-$ may be unbounded, we can instead use
$\mathcal{P}^0 = \mathcal{X}_f^- \cap \mathbb{B}_1^{R}(x^-)$ (where $R = 2 A(x^-)$),
which is a subset of $\mathcal{X}_f^-$ that envelops all of $\mathbb{B}_1^{C^t}(x^A)$
since $C^t < A(x^-)$. We also assume there is some $r > 0$ such that an $r$-ball centered
at $x^-$ is contained in $\mathcal{X}_f^-$. We now detail this IntersectSearch procedure
(Algorithm 3.6).

The backbone of the algorithm is uniform sampling from a bounded convex body by means of
the hit-and-run random walk technique introduced by Smith (1996) (Algorithm 3.7). Given
an instance $x^j \in \mathcal{P}^{s-1}$, hit-and-run selects a random direction $v$
through $x^j$ (we return to the selection of $v$ in Section 3.2.2). Since
$\mathcal{P}^{s-1}$ is a bounded convex set, the set
$\Omega = \{\omega \mid x^j + \omega v \in \mathcal{P}^{s-1}\}$ is a bounded interval
representing all points in $\mathcal{P}^{s-1}$ along direction $v$. Sampling $\omega$
uniformly from $\Omega$ yields the next step of the walk; $x^j + \omega v$. Under the
appropriate conditions (see Section 3.2.2), hit-and-run generates a sample uniformly from
the convex body after $O^*(D^3)$ steps² (Lovasz and Vempala 2004).



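Using hit-and-run we obtain $2N$ samples $\{x^j\}$ from $\mathcal{P}^{s-1}$ and check if
any satisfy $A(x^j) \le C^t$. If so, $x^j$ is in the intersection of $\mathcal{X}_f^-$
and $\mathbb{B}_1^{C^t}(x^A)$. Otherwise, we want to significantly reduce the size of
$\mathcal{P}^{s-1}$ without excluding any of $\mathbb{B}_1^{C^t}(x^A)$ so that sampling
concentrates towards the intersection (if it exists); for this we need a separating
hyperplane of $\mathbb{B}_1^{C^t}(x^A)$. For any $y \notin \mathbb{B}_1^{C^t}(x^A)$, the
(sub)gradient of the weighted $\ell_1$ cost given by

$$h^y_d = c_d \, \mathrm{sign}(y_d - x^A_d) \qquad (8)$$

separates $y$ and $\mathbb{B}_1^{C^t}(x^A)$.

2 $O^*(\cdot)$ denotes $O(\cdot)$ without logarithmic terms.

Nelson, Rubinstein, Huang, Joseph, Lau, Lee, Rao, Tran, Tygar

Algorithm 3.6. Intersect Search

IntersectSearch($\mathcal{P}^0$, $\mathcal{Q} = \{x^j \in \mathcal{P}^0\}$, $C$)
  for all $s = 1 \ldots T$ do begin
    (1) Generate $2N$ samples $\{x^j\}_{j=1}^{2N}$
          Choose $x$ from $\mathcal{Q}$
          $x^j \leftarrow$ HitRun($\mathcal{P}^{s-1}$, $\mathcal{Q}$, $x$)
    (2) If any $x^j$, $A(x^j) \le C$ terminate the for-loop
    (3) Put samples into 2 sets of size $N$
          $\mathcal{R} \leftarrow \{x^j\}_{j=1}^{N}$ and $\mathcal{S} \leftarrow \{x^j\}_{j=N+1}^{2N}$
    (4) $z^s \leftarrow \frac{1}{N} \sum_{x^j \in \mathcal{R}} x^j$
    (5) Compute $\mathcal{H}^{z^s}$ using Eq. (9)
    (6) $\mathcal{P}^s \leftarrow \mathcal{P}^{s-1} \cap \mathcal{H}^{z^s}$
    (7) Keep samples in $\mathcal{P}^s$
          $\mathcal{Q} \leftarrow \{x \in \mathcal{S} \wedge x \in \mathcal{P}^s\}$
  end for
  Return: the found [$x^j$, $\mathcal{P}^s$, $\mathcal{Q}$]; or No Intersect

Algorithm 3.7. Hit-and-Run Sampling

HitRun($\mathcal{P}$, $\{y^j\}$, $x^0$)
  for all $i = 1 \ldots K$ do begin
    Pick a random direction:
      $\nu_j \sim N(0, 1)$
      $v \leftarrow \sum_j \nu_j \cdot y^j$
    Find $\omega_1$ and $\omega_2$ s.t.
      $x^{i-1} - \omega_1 v \notin \mathcal{P}$ and $x^{i-1} + \omega_2 v \notin \mathcal{P}$
    repeat
      $\omega \sim \mathrm{Unif}(-\omega_1, \omega_2)$
      $x^i \leftarrow x^{i-1} + \omega v$
      if $\omega < 0$ then $\omega_1 \leftarrow -\omega$
      else $\omega_2 \leftarrow \omega$
    until $x^i \in \mathcal{P}$
  end for
  Return: $x^K$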
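The hit-and-run walk of Algorithm 3.7 can be sketched in Python as below. This is a hedged illustration, not the authors' implementation: the function name and the toy body (the unit square) are hypothetical, the body is reached only through a membership predicate `in_body`, and the random direction is drawn as a Gaussian combination of the centered stored samples, where the centering is an assumption of this sketch.

```python
import random
from typing import Callable, List, Sequence

Point = List[float]


def hit_and_run(in_body: Callable[[Point], bool], samples: Sequence[Point],
                x0: Point, n_steps: int, rng: random.Random) -> Point:
    """Random walk inside a bounded convex body given by a membership oracle."""
    dim = len(x0)
    mean = [sum(y[d] for y in samples) / len(samples) for d in range(dim)]
    x = list(x0)
    for _ in range(n_steps):
        # Direction: Gaussian combination of the (centered) stored samples.
        nu = [rng.gauss(0.0, 1.0) for _ in samples]
        v = [sum(n * (y[d] - mean[d]) for n, y in zip(nu, samples))
             for d in range(dim)]
        if all(abs(vd) < 1e-12 for vd in v):
            continue                      # degenerate direction, try again
        # Bracket the chord: grow w1, w2 until both endpoints leave the body.
        w1 = w2 = 1.0
        while in_body([xd - w1 * vd for xd, vd in zip(x, v)]):
            w1 *= 2.0
        while in_body([xd + w2 * vd for xd, vd in zip(x, v)]):
            w2 *= 2.0
        # Rejection sampling with a shrinking bracket.
        while True:
            w = rng.uniform(-w1, w2)
            candidate = [xd + w * vd for xd, vd in zip(x, v)]
            if in_body(candidate):
                x = candidate
                break
            if w < 0:
                w1 = -w
            else:
                w2 = w
    return x


def in_unit_square(p: Point) -> bool:     # hypothetical bounded convex body
    return all(0.0 <= pd <= 1.0 for pd in p)


rng = random.Random(0)
stored = [[0.2, 0.2], [0.8, 0.3], [0.5, 0.9], [0.3, 0.6]]
walk = [hit_and_run(in_unit_square, stored, [0.5, 0.5], 10, rng) for _ in range(20)]
print(walk[0])
```

Every call to `in_body` corresponds to one membership query, so the bracketing and rejection loops are where the query cost of the walk is incurred. The sketch makes no claim about mixing time; it only shows the mechanics of one walk.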

To achieve efficiency, we choose a point $z \in \mathcal{P}^{s-1}$ so that cutting
$\mathcal{P}^{s-1}$ through $z$ with the hyperplane $h^z$ eliminates a significant
fraction of $\mathcal{P}^{s-1}$. To do so, $z$ must be centrally located within
$\mathcal{P}^{s-1}$. We use the empirical centroid of half of the samples
$z = N^{-1} \sum_{x \in \mathcal{R}} x$ (the other half will be used in Section 3.2.2).
We cut $\mathcal{P}^{s-1}$ with the hyperplane $h^z$ through $z$; i.e.,
$\mathcal{P}^s = \mathcal{P}^{s-1} \cap \mathcal{H}^z$ where $\mathcal{H}^z$ is the
halfspace

$$\mathcal{H}^z = \{x \mid x^\top h^z \le z^\top h^z\} \qquad (9)$$

As shown by Bertsimas and Vempala, this cut achieves
$\mathrm{vol}(\mathcal{P}^s) \le \frac{2}{3} \mathrm{vol}(\mathcal{P}^{s-1})$ with high
probability if $N = O^*(D)$ and $\mathcal{P}^{s-1}$ is near-isotropic (see Section
3.2.2). Since the ratio of volumes between the initial circumscribing and inscribing
balls of the feasible set is $(R/r)^D$, the algorithm can terminate after
$T = O(D \log(R/r))$ unsuccessful iterations with a high probability that the
intersection is empty.
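The cutting step can be illustrated with a short sketch: compute the empirical centroid of a set of samples, take the sign-based (sub)gradient of the weighted $\ell_1$ cost at that centroid, and keep only the halfspace on the ball's side. The helper names and the toy numbers are hypothetical and serve only this example.

```python
from typing import Callable, List, Sequence

Point = List[float]


def l1_subgradient(y: Sequence[float], x_a: Sequence[float],
                   c: Sequence[float]) -> Point:
    """Subgradient of the weighted l1 cost at y: h_d = c_d * sign(y_d - x_a_d)."""
    return [cd * ((yd > ad) - (yd < ad)) for yd, ad, cd in zip(y, x_a, c)]


def centroid(points: Sequence[Point]) -> Point:
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def cut_through(z: Point, x_a, c) -> Callable[[Point], bool]:
    """Membership test for the halfspace {x : x.h <= z.h} with h taken at z."""
    h = l1_subgradient(z, x_a, c)
    offset = sum(zd * hd for zd, hd in zip(z, h))
    return lambda x: sum(xd * hd for xd, hd in zip(x, h)) <= offset


# Hypothetical data: target x_a, costs c, and samples that all cost more than 1.
x_a, c = [0.0, 0.0], [1.0, 2.0]
samples = [[3.0, 1.0], [4.0, 2.0], [2.0, 0.0]]
z = centroid(samples)
keep = cut_through(z, x_a, c)

# Vertices of the cost-1 ball around x_a; none of them may be cut away.
ball_vertices = [[1.0, 0.0], [-1.0, 0.0], [0.0, 0.5], [0.0, -0.5]]
print(z, [keep(v) for v in ball_vertices], keep([4.0, 2.0]))
```

Because the cost is convex, every point of cost at most $C$ lies strictly on the kept side of the hyperplane whenever the centroid has cost greater than $C$, which is why the cut never removes part of the ball. In a fuller implementation the refined body would be the conjunction of the previous membership test and `keep`.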



Because every iteration in Algorithm 3.6 requires $N = O^*(D)$ samples, each of which
need $K = O^*(D^3)$ random walk steps, and there are $O^*(D)$ iterations, Algorithm 3.6
requires $O^*(D^5)$ queries.



3.2.2 Sampling from a Convex Body 

Until this point, we assumed the hit-and-run random walk efficiently produces uniformly
random samples from any bounded convex body $\mathcal{P}$ accessible through membership
queries. However, if the body is severely elongated, randomly selected directions will
rarely align with the long axis of the body and our random walk will take small steps
(relative to the long axis) and mix slowly. For the sampler to mix effectively, we need
the convex body $\mathcal{P}$ to be near-isotropic; i.e., for any unit vector $v$,
$E_{x \sim \mathcal{P}} \left[ \left( v^\top (x - E_{x \sim \mathcal{P}}[x]) \right)^2 \right]$
is bounded between 1/2 and 3/2 of $\mathrm{vol}(\mathcal{P})$.

If the body is not near-isotropic, we can rescale $\mathcal{X}$ with an appropriate
affine transformation $T$. With sufficiently many samples from $\mathcal{P}$ we can
estimate $T$ as their empirical covariance matrix. Instead, we rescale $\mathcal{X}$
implicitly using a technique described by Bertsimas and Vempala (2004). We maintain a set
$\mathcal{Q}$ of sufficiently many uniform samples from the body $\mathcal{P}^s$ and in
hit-and-run we sample directions based on this set. Because the samples are distributed
uniformly in $\mathcal{P}^s$, the directions we sample based on the points in
$\mathcal{Q}$ implicitly reflect the covariance structure of $\mathcal{P}^s$.

We must ensure $\mathcal{Q}$ is a set of sufficiently many samples from $\mathcal{P}^s$
after each cut: $\mathcal{P}^s \leftarrow \mathcal{P}^{s-1} \cap \mathcal{H}^{z^s}$. To
do so, we resample $2N$ points from $\mathcal{P}^{s-1}$ using hit-and-run; half of these,
$\mathcal{R}$, are used to estimate the centroid $z^s$ for the cut and the other half,
$\mathcal{S}$, are used to repopulate $\mathcal{Q}$ after the cut. Because $\mathcal{S}$
contains independent uniform samples from $\mathcal{P}^{s-1}$, those in $\mathcal{P}^s$
after the cut constitute independent uniform samples from $\mathcal{P}^s$ (rejection
sampling). By choosing $N$ sufficiently large, we will have sufficiently many points to
repopulate $\mathcal{Q}$.

Finally, we also need an initial set $\mathcal{Q}$ of uniform samples from
$\mathcal{P}^0$ but we only have a single point $x^- \in \mathcal{X}_f^-$. The
RoundingBody algorithm described by Lovasz and Vempala (2003) uses $O^*(D^4)$ membership
queries to make the convex body near-isotropic. We use this as a preprocessing step; that
is, given $\mathcal{X}_f^-$ and $x^- \in \mathcal{X}_f^-$ we make
$\mathcal{P}^0 = \mathcal{X}_f^- \cap \mathbb{B}_1^{R}(x^-)$ and use the RoundingBody
algorithm to produce $\mathcal{Q} = \{x^j \in \mathcal{P}^0\}$ for Algorithm 3.6.



Near-Optimal Evasion of Convex-Inducing Classifiers 



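Algorithm 3.8. Convex $\mathcal{X}_f^-$ Set Search

SetSearch($\mathcal{P}$, $\mathcal{Q} = \{x^j \in \mathcal{P}\}$, $C^-$, $C^+$, $\epsilon$)
  while $C^-/C^+ > 1 + \epsilon$ do begin
    $C^t \leftarrow \sqrt{C^- \cdot C^+}$
    [$x^*$, $\mathcal{P}'$, $\mathcal{Q}'$] $\leftarrow$ IntersectSearch($\mathcal{P}$, $\mathcal{Q}$, $C^t$)
    if intersection found then begin
      Let $C^- \leftarrow A(x^*)$
      $\mathcal{P} \leftarrow \mathcal{P}'$ and $\mathcal{Q} \leftarrow \mathcal{Q}'$
    else
      $C^+ \leftarrow C^t$
    end if
  end while
  Return: $x^*$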



3.2.3 Optimization over $\ell_1$ Balls

Here we suggest improvements for $\ell_1$ minimization using iterative IntersectSearch
and present them as SetSearch in Algorithm 3.8.

First, since $x^A$, $x^-$ and $\mathcal{Q}$ are the same for every iteration of the
optimization procedure, we only run the RoundingBody procedure once as a preprocessing
step. The set of samples $\{x^j \in \mathcal{P}^0\}$ it produces are sufficient to
initialize IntersectSearch at each stage of the binary search. Second, the separating
hyperplane $h^y$ for point $y$ given by Eq. (8) is valid for all weighted $\ell_1$-balls
of cost $C < A(y)$. Thus, the final state from a successful call to IntersectSearch can
be used as the starting state for the subsequent call to IntersectSearch.

4 CONCLUSIONS & FUTURE WORK

The analysis of our algorithms shows that $\mathcal{F}^{\mathrm{convex}}$ is ε-IMAC
searchable for weighted $\ell_1$ costs. When the positive class is convex we give
efficient techniques that outperform previous reverse-engineering approaches for linear
classifiers. When the negative class is convex, we apply a randomized Ellipsoid method to
achieve efficient ε-IMAC search. If the adversary is unaware of which set is convex, they
can trivially run both searches to discover an ε-IMAC with a combined polynomial query
complexity.

Exploring near-optimal evasion is important for understanding how an adversary may
circumvent learners in security-sensitive settings. As described here, our algorithms may
not always directly apply in practice since various real-world obstacles persist. Queries
may be only partially observable or noisy and the feature set may be only partially
known. Moreover, an adversary may not be able to query all $x \in \mathcal{X}$. Queries
must be objects (such as email) that are mapped into $\mathcal{X}$. A real-world
adversary must invert the feature mapping, a generally difficult task. These limitations
necessitate further research on the impact of partial observability and approximate
querying on ε-IMAC search, and on the design of more secure filters. Broader open
problems include: is ε-IMAC search possible on other classes of learners such as SVMs
(linear in a large, possibly infinite feature space)? Is ε-IMAC search feasible against
an online learner that adapts as it is queried? Can learners be made resilient to these
threats and how does this impact learning performance?

References 

Dimitris Bertsimas and Santosh Vempala. Solving convex programs by random walks. J. ACM,
51(4):540-556, 2004.

Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press,
2004.

Nilesh Dalvi, Pedro Domingos, Mausam, Sumit Sanghai, and Deepak Verma. Adversarial
classification. In Proc. KDD'04, pages 99-108, 2004.

Anukool Lakhina, Mark Crovella, and Christophe Diot. Diagnosing network-wide traffic
anomalies. In Proc. SIGCOMM'04, pages 219-230, 2004.

Laszlo Lovasz and Santosh Vempala. Simulated annealing in convex bodies and an O*(n^4)
volume algorithm. In Proc. FOCS'03, 2003.

Laszlo Lovasz and Santosh Vempala. Hit-and-run from a corner. In Proc. STOC'04, pages
310-314, 2004.

Daniel Lowd and Christopher Meek. Adversarial learning. In Proc. KDD'05, pages 641-647,
2005.

Luis Rademacher and Navin Goyal. Learning convex bodies is hard. In Proc. COLT'09, pages
303-308, 2009.

Greg Schohn and David Cohn. Less is more: Active learning with support vector machines.
In Proc. ICML'00, 2000.

Robert L. Smith. The hit-and-run sampler: a globally reaching Markov chain sampler for
generating arbitrary multivariate distributions. In Proc. WSC'96, pages 260-264, 1996.

Kymie M. C. Tan, Kevin S. Killourhy, and Roy A. Maxion. Undermining an anomaly-based
intrusion detection system using common exploits. In Proc. RAID'02, pages 54-73, 2002.

David Wagner and Paolo Soto. Mimicry attacks on host-based intrusion detection systems.
In Proc. CCS'02, pages 255-264, 2002.



